Skip to content

Conversation

@mateuszpn
Copy link
Contributor

@mateuszpn mateuszpn commented Nov 21, 2025

Fix llvm-lit to enable the simultaneous use of _v1 and _v2 to select the l0 adapter and the use of arch to select the GPU, eg.:
llvm-lit --param "sycl_devices=level_zero_v2:arch-intel_gpu_mtl_u" sycl/test-e2e
Necessary to run tests on integrated GPU on system with another GPU
Includes fixed setting of ZE_AFFINITY_MASK in multi-gpu systems

@mateuszpn mateuszpn requested a review from a team as a code owner November 21, 2025 15:33
@aelovikov-intel aelovikov-intel changed the title [SYCL][e2e-tests] fix lit for arch selection and l0v2 adapter [SYCL][E2E] Fix lit for arch selection and l0v2 adapter Nov 21, 2025
Comment on lines 337 to 341
backend, device = parsed_dev_name.split(":", 1)
device_selector = parsed_dev_name
if backend == "level_zero" and device.isdigit():
extra_env.append(f"ZE_AFFINITY_MASK={device}")
device_selector = f"{backend}:0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some local lit configs, eg. sycl/test-e2e/AddressSanitizer/lit.local.cfg set affinity mask to 0. It needs to be overridden if we do not use level_zero:0 device. Comment added in the code

Copy link
Contributor

@aelovikov-intel aelovikov-intel Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should we do that here and not fix AddressSanitizer? IMO, this file should just save {device} somewhere in the config and the AddressSanitizer/lit.local.cfg should use that for setting ZE_AFFINITY_MASK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry I missed that this was specifically for asan.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aelovikov-intel Setting of ZE_AFFINITY_MASK moved to local configs.

Copy link
Contributor

@aelovikov-intel aelovikov-intel Nov 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've taken a deeper look at what's going on (and + @uditagarwal97 ). From what I understand, this ZE_AFFINITY_MASK issue is unrelated to your change (could be reproduced with level_zero:arch-<smth> without explicit v1/v2). Can you clarify why are you making this change here at all?

"opencl": ("cpu", "gpu", "fpga"),
"cuda": "gpu",
"level_zero": "gpu",
"level_zero": ("gpu", "0", "1"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you trying to allow llvm-lit --param sycl_devices=level_zero:0? We intentionally decided to disallow that in review of #18197

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the arch- setting ends up pointing at particular device, eg. level_zero:1, we do not do it in the params. Removed.

expanded = "env"

extra_env = get_extra_env([parsed_dev_name])
backend, device = parsed_dev_name.split(":", 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example of the intended lit string after the change in the PR description is

llvm-lit --param "sycl_devices=level_zero_v2:arch-intel_gpu_mtl_u" sycl/test-e2e

but I was expecting something like

llvm-lit --param "sycl_devices='level_zero_v1:gpu;level_zero_v2:gpu" sycl/test-e2e

Can you clarify on the intended use case? Thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to force tests to run on specified device, eg. integrated gpu, even when other devices are present. The option was available before (see lit.cpg.py:901). Although it is rarely used, it is still perfect for testing integrated GPUs. Unfortunately, due to a bug, it did not work with the _v1 or _v2 postfixes in the backend name. The PR fixes this bug.

Copy link
Contributor

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't comment if ZE_AFFINITY_MASK is the right way to pin it to a specific device but other than that lgtm, thanks!

Signed-off-by: Mateusz P. Nowak <mateusz.p.nowak@intel.com>
features.update(sg_size_features)
features.update(architecture_feature)
features.update(device_family)
features.update(aspects)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this line? Isn't 1120 enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using arch-selection, the device is not :gpu and 'gpu' is not added to the features. The questioned line solves this in simple way, but with overhead - adds more then only 'gpu'. Replaced it with clean solution

Comment on lines 12 to 14
import lit.formats
import lit.util

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored, but why to keep unused imports?

config.unsupported=True

config.environment["ZE_AFFINITY_MASK"] = "0"
if hasattr(config, 'ze_affinity_mask') and config.ze_affinity_mask is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have

# Determine ZE_AFFINITY_MASK for Level Zero devices.
# Sanitizer tests need to set ZE_AFFINITY_MASK to a single device index
config.ze_affinity_mask = None

below, is hasattr check needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, not. Fixed

Comment on lines 36 to 37
else:
config.environment["ZE_AFFINITY_MASK"] = "0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is else really needed? IIUC, config.ze_affinity_mask would be set for L0 devices, and for CPU this env. variable is meaningless anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines 354 to 358
# If ZE_AFFINITY_MASK is set, it filters devices so we should use :0
device_selector = parsed_dev_name
if "ZE_AFFINITY_MASK" in test.config.environment:
backend, _ = parsed_dev_name.split(":", 1)
device_selector = f"{backend}:0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I recommend this way, having code trying to play nice with existing envvars seems bad

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, we need to check if affinity mask was set by local cfg. Changed, and added to possibly dangerous list.

Signed-off-by: Mateusz P. Nowak <mateusz.p.nowak@intel.com>
Copy link
Contributor

@zhaomaosu zhaomaosu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitizer changes lgtm

@mateuszpn
Copy link
Contributor Author

@intel/llvm-gatekeepers Please consider merging. The issue has nothing to do with modified scripts and is tracked under #20774

@steffenlarsen steffenlarsen merged commit 046f67d into intel:sycl Nov 28, 2025
35 of 38 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants